ABSTRACT

This paper introduces a model for describing
outliers (observations which are extreme in some sense or violate the
apparent pattern of other observations) in linear regression which
can be viewed as a mixture of a quadratic and a linear regression.

The maximum likelihood estimators of the parameters in the model are
derived and their asymptotic properties discussed. Small sample
behavior of the model and robustness to inaccurate specification of
the mixing parameter were investigated using Monte Carlo techniques.
The asymptotic properties provide reasonable indications of behavior
for n as small as 21 and the procedure appears quite robust to the
inaccurate specification of the mixing parameter. Building models to
describe outliers and estimating their parameters provides an
interesting alternative to procedures of outlier detection followed
by ordinary least squares procedures. (Author)
Introductory Statement

The central mission of the Stanford Center for Research and Development
in Teaching is to contribute to the improvement of teaching in
American schools. Given the urgency of the times, technological
developments, and advances in knowledge from the behavioral sciences about
teaching and learning, the Center works on the assumption that a fundamental
reformulation of the future role of the teacher will take place. The
Center's mission is to specify as clearly, and on as empirical a basis as
possible, the direction of that reformulation, to help shape it, to fashion
and validate programs for training and retraining teachers in accordance
with it, and to develop and test materials and procedures for use in these
new training programs.

The Center is at work in three interrelated problem areas:

(a) Heuristic Teaching, which aims at promoting self-motivated and
sustained inquiry in students, emphasizes affective as well as cognitive
processes, and places a high premium upon the uniqueness of each pupil,
teacher, and learning situation; (b) The Environment for Teaching, which
aims at making schools more flexible so that pupils, teachers, and
learning materials can be brought together in ways that take account of their
many differences; and (c) Teaching Students from Low-Income Areas, which
aims to determine whether more heuristically oriented teachers and more
open kinds of schools can and should be developed to improve the education
of those currently labeled as the poor and the disadvantaged.

This paper grew out of the activities of the Center's Methodology
Unit and represents a methodological development generated in answer
to problems encountered in the reanalysis of the Rosenthal-Jacobson
Pygmalion in the Classroom study. Such data analysis problems pose
frequent difficulties in data gathered by Center projects.
INTRODUCTION

The standard linear regression model for fixed x's is given by

$$ y_i = \alpha + \beta(x_i - \bar{x}) + e_i, \qquad i = 1, 2, \ldots, n \tag{1} $$

where

$$ \operatorname{Cov}(e_i, e_j) = 0, \qquad i \neq j \tag{2} $$

$$ e_i \sim N(0, \sigma^2) \tag{3} $$



Occasionally the data may contain observations inconsistent with the
apparent pattern of the rest of the observations. Such aberrant
observations or outliers could lead in extreme cases to rejection of
(1) as the form of the regression relationship. Even if (1) is assumed,
estimators of $\alpha$ and $\beta$ by standard least squares procedures based on
assumptions (2) and (3) may have unsatisfactory distributional properties
such as large bias and large variance in the presence of outliers.

In this paper we formulate some models to describe outliers in
regression problems, give a brief review of previous work in this area,
and propose a particular model suggested by some real data. Then we
derive the maximum likelihood estimators of the parameters in the model
and their asymptotic properties. Monte Carlo investigations to determine
the small sample properties of the maximum likelihood estimators and
their robustness to inaccurate specification of the mixing parameter are
reported. Large sample and small sample comparisons under our quadratic
outlier model of the maximum likelihood estimators and the ordinary
least squares estimators for linear regression are discussed. Finally
the model is applied to some data obtained in the Rosenthal-Jacobson
teacher expectancy study.

Some Models for Outliers in Linear Regression

We begin by outlining some simple models for outliers in linear
regression problems suggested by those proposed in the single sample
case (see, e.g., Grubbs, 1969, or Dixon, 1962). Retaining assumptions
(1) and (2), alternatives to (3) which generate outliers are models with
skewed error distributions such as:

$$ e \sim (1 - \gamma)N(0, \sigma^2) + \gamma N(\lambda, \sigma^2) \tag{4} $$

$$ e \sim (1 - \gamma)N(0, \sigma^2) + \gamma N(\lambda(x), \sigma^2) \tag{5} $$

$$ e \sim (1 - \gamma(x))N(0, \sigma^2) + \gamma(x) N(\lambda, \sigma^2) \tag{6} $$

$$ e \sim (1 - \gamma(x))N(0, \sigma^2) + \gamma(x) N(\lambda(x), \sigma^2) \tag{7} $$

models like (4) or (5) in which it is known that $n-k$ of the $e$
observations are $N(0, \sigma^2)$ and that $k$ of the observations are
$N(\lambda, \sigma^2)$ or $N(\lambda(x), \sigma^2)$. (8)

With assumptions (1) and (2), error model (4) describes a process in which there
is a constant probability that a y observation will be biased by an
amount $\lambda$. In model (5) the probability is constant but the amount of
bias depends on x. In model (6) the bias is constant but the probability
of a biased observation depends on x. Model (7) is a combination of (5)
and (6).

We can propose analogous models with symmetric error distributions
for the scale contaminated case:

$$ e \sim (1 - \gamma)N(0, \sigma^2) + \gamma N(0, \lambda^2\sigma^2) \tag{9} $$

$$ e \sim (1 - \gamma)N(0, \sigma^2) + \gamma N(0, \lambda^2(x)\sigma^2) \tag{10} $$

$$ e \sim (1 - \gamma(x))N(0, \sigma^2) + \gamma(x) N(0, \lambda^2\sigma^2) \tag{11} $$

$$ e \sim (1 - \gamma(x))N(0, \sigma^2) + \gamma(x) N(0, \lambda^2(x)\sigma^2) \tag{12} $$

models like (9) or (10) in which it is known that $n-k$ of the $e$
observations are $N(0, \sigma^2)$ and $k$ are $N(0, \lambda^2\sigma^2)$ or
$N(0, \lambda^2(x)\sigma^2)$. (13)

models like (9) where $\lambda^2\sigma^2$ follows some distribution. (14)

Model (9) describes a process in which occasional y observations come
from a population with a larger variance. In model (10) the variance of
aberrant y observations depends on x. In model (11) the probability
that a y observation has a larger variance depends on x. Model (12)
is a combination of (10) and (11).

These models with $\lambda(x)$, $\gamma(x)$ suitably defined can describe a
wide variety of cases.
Review of Literature

We define an outlier as an observation which is extreme in some
sense or violates the apparent pattern of the other observations. Most
of the statistical literature on outliers is concerned with two basic
problems: detection of outliers and estimation of parameters in the
presence of outliers.

There are several approaches to the detection problem when we have
two variables. Let y and x denote the two variables and suppose at
first that both y and x are random variables. For bivariate and
multivariate models where x or y are distributed as in (8) or (13) with
at most one outlier, a test statistic for outlier detection which
maximizes the probability of making the correct decision has been discussed;
see Ferguson (1961b), Karlin and Truax (1960). When more than one outlier
is suspected there is little information on how to proceed. One technique
is to apply the method described above repeatedly. Another is to have
some prior information that particular observations are suspect and, then,
possibly apply tests developed by Wilks (1963) that generalize Grubbs
(1950). Still another alternative is to treat each variable separately
and apply univariate single sample techniques.

When x is the independent variable and is measured without error
and the regression of y on x is given by (1) where the $e_i$ are
distributed as in one of models (4)-(14), a number of suggestions for locating
possible outliers have been made in the literature. One suggestion is
to compute the maximum squared studentized residual and reject the
observation corresponding to this residual if it is significantly large.
Clearly, this procedure has its difficulties; see Mickey et al. (1967).
Another suggestion made by Mickey et al. (1967) is to find the single
observation whose deletion causes the greatest reduction in the sum of
squared residuals. Having found and deleted this observation, the
procedure finds the next observation whose deletion reduces the sum of squared
residuals as much as possible. No theory for the procedure is available.
The procedure can be carried out on the computer by using a standard
stepwise regression program such as BMD02R. The regression relation must have
a known form (e.g., linear); but no distributional assumptions need to be
made for x and the distribution of y may follow any of the models
outlined above.
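
The sequential deletion idea attributed above to Mickey et al. (1967) can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `rss_line` and `sequential_deletion`, the brute-force refitting, and the small data set with one planted aberrant point are all hypothetical and are not taken from the original paper or from BMD02R.

```python
import numpy as np

def rss_line(x, y):
    """Residual sum of squares from the least squares line of y on x."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

def sequential_deletion(x, y, n_delete):
    """Greedy search: at each step drop the point whose removal lowers the RSS most."""
    keep = list(range(len(x)))
    dropped = []
    for _ in range(n_delete):
        def rss_without(j):
            rest = [i for i in keep if i != j]
            return rss_line(x[rest], y[rest])
        best = min(keep, key=rss_without)
        keep.remove(best)
        dropped.append(best)
    return dropped

# Illustrative data: an exact line with one planted aberrant observation.
x = np.arange(1.0, 11.0)
y = 2.0 + 0.5 * x
y[7] += 10.0
print(sequential_deletion(x, y, 1))  # index of the planted point
```

As the text notes, no theory is available for when to stop deleting, so `n_delete` has to be supplied by the user.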

The problem of detecting outliers in the regression setup requires
much more work. Little theoretical guidance for consumers of statistical
regression analysis is available. A very interesting approach to outliers
in calibration analysis is suggested by Youden (see Barnett, 1965).

A review of how to estimate $\alpha$, $\beta$ and a measure of their variability
in the general case becomes a rather large problem. In our brief review
we will restrict consideration to the model defined by (1), (2) and some
choice of (4)-(13). So far the only work appears to be for symmetric
error models such as those in (9)-(13).

The main lines of attack on the problem of choosing estimators for
$\alpha$, $\beta$ are essentially generalizations of the approaches to the single
sample problem. Anscombe's (1967) paper applies when the $e_i$ are a
random sample from a t distribution or a distribution in some sense
well-approximated by a t (an example of model (14)). Anscombe indicates
that minimization of the Huber metric may be used and, generally, will
give estimates "close" to those obtained by his Bayesian approach using
the t as the basic data distribution. If the $e_i$ are distributed as a
scale contaminated compound normal distribution (model (9)), then the
methods of Box and Tiao (1968) may be extended to derive estimators for
$\alpha$, $\beta$. Anscombe's (1960a) paper is useful when we want to test for
skewness, kurtosis or heteroscedasticity. A few suggestions on estimation
procedures for $\alpha$, $\beta$ based on ranks or signs have been investigated; see
Mood (1950), Adichie (1967a), Sen (1968), Theil (1950).

Estimators for $\alpha$, $\beta$ may also be deduced by first screening the
data for outliers by one of the techniques suggested in the section on
detection and then estimating $\alpha$, $\beta$ by minimizing some metric. Not much
is known about this approach except the paper by Anscombe and Barron
(1966) for estimating the population mean from a single sample.

A QUADRATIC OUTLIER MODEL

Our interest in the problem of outliers in linear regression
problems was kindled by two examples of data problems in which aberrant
observations seemed to occur only on one side of the regression line and
at one extreme of the x's (see Figures 5, 6, 7). Thus we were led to
consideration of error models (4)-(7). Model (5) seemed to describe
best our impression that outliers were increasingly far from the line
for more extreme x and we were led to an examination of model (5) with a
reasonable specification of $\lambda(x)$.

This paper, then, is concerned with estimation of the parameters in
the quadratic outlier model (15). Since the adoption of such a model
implies the occurrence of a similar pattern of outliers across several
sets of data and the model may generate many nonextreme "aberrant"
observations, it seems more profitable to concentrate our efforts
directly on parameter estimation rather than on any two-stage detection
of outliers and parameter estimation procedures.

Quadratic outlier model:

$$ y_i = \alpha + \beta(x_i - \bar{x}) + e_i, \qquad i = 1, 2, \ldots, n $$

$$ \operatorname{Cov}(e_i, e_j) = 0, \qquad i \neq j $$

$$ e_i \sim (1 - \gamma)N(0, \sigma^2) + \gamma N(\lambda(x_i), \sigma^2) \tag{15} $$

$$ \lambda(x_i) = c(x_i - m)^2 $$

m and $\gamma$ known

x's fixed

We choose $\lambda(x_i) = c(x_i - m)^2$, with m known, as a simple function
which describes our impression of the data. We assume that the general
pattern of outliers, and thus m, is known. Model (15) describes a
bias which increases rapidly for extreme x. By defining m as $x_{min}$,
$\bar{x}$, or $x_{max}$ and forcing c to be positive or negative we can obtain the
bias patterns shown in Figure 1.
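
To make the quadratic outlier model concrete, the following minimal sketch draws one sample of y from model (15). It is an illustration only: the function name `simulate_quadratic_outliers`, the use of NumPy's default generator, and the seed are assumptions introduced here, and the parameter values simply echo the basic parameter set described later in the paper (21 equally spaced x's, m at the smallest x, mixing parameter .20, error variance 36, c = .09).

```python
import numpy as np

def simulate_quadratic_outliers(x, alpha, beta, c, m, gamma, sigma, rng):
    """Draw y from model (15): with probability gamma an error is shifted by c*(x - m)**2."""
    x = np.asarray(x, dtype=float)
    is_outlier = rng.random(x.size) < gamma            # which observations are biased
    shift = np.where(is_outlier, c * (x - m) ** 2, 0.0)
    e = rng.normal(0.0, sigma, size=x.size) + shift    # mixture error term
    y = alpha + beta * (x - x.mean()) + e
    return y, is_outlier

rng = np.random.default_rng(0)                         # arbitrary seed for repeatability
x = np.arange(1, 22)
y, flag = simulate_quadratic_outliers(x, 0.0, 1.0, 0.09, x.min(), 0.20, 6.0, rng)
print(int(flag.sum()), "of", x.size, "observations were drawn from the biased component")
```

With m at the smallest x and c positive, the biased observations lie above the line and drift further from it as x grows, which is the one-sided pattern the model is meant to describe.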

The assumption of known $\gamma$ is not so restrictive as might at first
appear. The literature indicates that its accurate estimation is
difficult and our own results indicate that incorrect specification is
not serious. The problem of estimating the parameters of a mixture of
distributions has been around a long time. Pearson (1894) discussed
estimates based on the method of moments. Rao (1952) reviewed this
approach but pointed out that the estimate of the proportion of the
mixture has a large variance and its estimation requires very large samples.
Hill (1963), using some expansions of the information for the
estimation of the mixing probability $\gamma$ for two exponential
distributions, shows that unless the mixed distributions are very well separated,
extremely large samples are needed even for moderate precision when all
other parameters are known. Larger samples are needed if the other
parameters must be estimated as well. Box and Tiao (1968), exploring
the estimation of $\theta$ in the mixture
$(1 - \gamma)N(\theta, \sigma^2) + \gamma N(\theta, k^2\sigma^2)$ by
Bayesian methods assuming k and $\gamma$ known and then using various values
of k and $\gamma$, showed that the estimator of $\theta$ is not unduly sensitive
to changes in k or $\gamma$ in a reasonable range.
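Maximum Likelihood Estimators

Assuming m specified, $\gamma$ known, and the x's fixed, the maximum
likelihood estimators of $\alpha$, $\beta$, c, $\sigma^2$ are given by the following
equations:

$$ \hat{\alpha} = \bar{y} - \frac{\gamma\hat{c}}{n}\sum (x_i - m)^2 w_i \tag{16} $$

$$ \hat{\beta} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} - \gamma\hat{c}\,\frac{\sum (x_i - \bar{x})(x_i - m)^2 w_i}{\sum (x_i - \bar{x})^2} \tag{17} $$

$$ \hat{\sigma}^2 = \frac{1}{n}\sum \left(y_i - \hat{\alpha} - \hat{\beta}(x_i - \bar{x})\right)^2 - \frac{\gamma\hat{c}^2}{n}\sum (x_i - m)^4 w_i \tag{18} $$

$$ \hat{c} = \frac{\sum (x_i - m)^2 \left(y_i - \hat{\alpha} - \hat{\beta}(x_i - \bar{x})\right) w_i}{\sum (x_i - m)^4 w_i} \tag{19} $$

where

$$ w_i^{-1} = \gamma + (1 - \gamma)\exp\left\{\frac{1}{2\hat{\sigma}^2}\left[-2\hat{c}(x_i - m)^2\left(y_i - \hat{\alpha} - \hat{\beta}(x_i - \bar{x})\right) + \hat{c}^2 (x_i - m)^4\right]\right\} \tag{20} $$

A Fortran IV computer program to obtain iterative solutions to these
equations was written.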
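
The original Fortran IV program is not reproduced in the paper, so the following is only a sketch of one way such an iterative solution could be organized. It assumes an EM-type scheme: given current estimates, compute for each observation the probability that it came from the biased component, then refit the intercept, slope and c by weighted least squares and update the error variance. A fixed point of this scheme satisfies the likelihood equations for model (15). The function name `fit_quadratic_outlier`, the absolute convergence tolerance, the rule used to start c, and the small fixed data set are all assumptions made for illustration.

```python
import numpy as np

def fit_quadratic_outlier(x, y, m, gamma, max_iter=100, tol=1e-6):
    """EM-style iteration for model (15) with m and gamma treated as known."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    xc = x - x.mean()                 # centered x
    d = (x - m) ** 2                  # the bias for a biased point is c * d_i
    sxx = np.sum(xc ** 2)

    # Starting values: least squares line; c from the largest absolute residual
    # (a simplification of the quadrant rule described in the paper).
    alpha = y.mean()
    beta = np.sum(xc * y) / sxx
    r = y - alpha - beta * xc
    sigma2 = np.mean(r ** 2)
    j = int(np.argmax(np.where(d > 0, np.abs(r), -1.0)))
    c = r[j] / d[j]

    trace = []                        # log-likelihood before each update
    for _ in range(max_iter):
        r = y - alpha - beta * xc
        log_clean = np.log1p(-gamma) - r ** 2 / (2.0 * sigma2)
        log_out = np.log(gamma) - (r - c * d) ** 2 / (2.0 * sigma2)
        log_mix = np.logaddexp(log_clean, log_out)
        trace.append(float(np.sum(log_mix) - 0.5 * n * np.log(2.0 * np.pi * sigma2)))
        p = np.exp(log_out - log_mix)  # probability that observation i is biased

        # Weighted least squares for (alpha, beta, c) given the probabilities p.
        spd, spxd, spdd = np.sum(p * d), np.sum(p * xc * d), np.sum(p * d ** 2)
        lhs = np.array([[n, 0.0, spd], [0.0, sxx, spxd], [spd, spxd, spdd]])
        rhs = np.array([np.sum(y), np.sum(xc * y), np.sum(p * d * y)])
        a_new, b_new, c_new = np.linalg.solve(lhs, rhs)
        r = y - a_new - b_new * xc
        s_new = float(np.mean((1.0 - p) * r ** 2 + p * (r - c_new * d) ** 2))

        step = max(abs(a_new - alpha), abs(b_new - beta),
                   abs(c_new - c), abs(s_new - sigma2))
        alpha, beta, c, sigma2 = a_new, b_new, c_new, s_new
        if step < tol:
            break
    return {"alpha": alpha, "beta": beta, "c": c, "sigma2": sigma2}, trace

# Illustrative data: a line with fixed +/-0.5 "errors" and four planted biased points.
x = np.arange(1.0, 22.0)
noise = np.where(np.arange(21) % 2 == 0, 0.5, -0.5)
y = (x - x.mean()) + noise
idx = [12, 16, 19, 20]
y[idx] += 0.09 * (x[idx] - 1.0) ** 2
est, trace = fit_quadratic_outlier(x, y, m=1.0, gamma=0.20)
print(est)
```

The returned `trace` holds the log-likelihood at each step; for an EM-type update it should never decrease, which gives a simple check on an implementation. Like any iterative solution of mixture likelihood equations, the scheme may stop at a local maximum, so the starting values matter.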



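Asymptotic Properties of the Maximum Likelihood Estimators

Asymptotically the maximum likelihood estimators of $\alpha$, $\beta$, c, $\sigma^2$
have a multivariate normal distribution. That is, for fixed x's in the
interval (a, b) the estimators

$$ \sqrt{n}(\hat{\alpha}_{ML} - \alpha), \quad \sqrt{n}(\hat{\beta}_{ML} - \beta), \quad \sqrt{n}(\hat{c}_{ML} - c), \quad \sqrt{n}(\hat{\sigma}^2_{ML} - \sigma^2) $$

have an asymptotic multivariate normal distribution with variance-covariance
matrix V given by $nM^{-1}$, where M is the information
matrix. Letting $A = \sigma^4 M$, the terms in A are given by the following
formulas:

$$ a_{11} = n\sigma^2 - \gamma(1-\gamma)c^2 \sum (x_i - m)^4 I_i $$

$$ a_{12} = -\gamma(1-\gamma)c^2 \sum (x_i - m)^4 (x_i - \bar{x}) I_i $$

$$ a_{13} = \gamma\sigma^2 \sum (x_i - m)^2 + \gamma(1-\gamma)c \sum (x_i - m)^4 \left[J_i - c(x_i - m)^2 I_i\right] $$

$$ a_{14} = \frac{\gamma(1-\gamma)c^2}{\sigma^2} \sum (x_i - m)^4 \left(-J_i + \frac{c}{2}(x_i - m)^2 I_i\right) $$

$$ a_{22} = \sigma^2 \sum (x_i - \bar{x})^2 - \gamma(1-\gamma)c^2 \sum (x_i - m)^4 (x_i - \bar{x})^2 I_i $$

$$ a_{23} = \gamma\sigma^2 \sum (x_i - \bar{x})(x_i - m)^2 + \gamma(1-\gamma)c \sum (x_i - \bar{x})(x_i - m)^4 \left[J_i - c(x_i - m)^2 I_i\right] $$

$$ a_{24} = \frac{\gamma(1-\gamma)c^2}{\sigma^2} \sum (x_i - \bar{x})(x_i - m)^4 \left(-J_i + \frac{c}{2}(x_i - m)^2 I_i\right) $$

$$ a_{33} = \gamma\sigma^2 \sum (x_i - m)^4 - \gamma(1-\gamma) \sum (x_i - m)^4 \left(K_i - 2c(x_i - m)^2 J_i + c^2 (x_i - m)^4 I_i\right) $$

$$ a_{34} = \frac{\gamma(1-\gamma)c}{\sigma^2} \sum (x_i - m)^4 \left(K_i - \frac{3}{2}c(x_i - m)^2 J_i + \frac{c^2}{2}(x_i - m)^4 I_i\right) $$

$$ a_{44} = \frac{n}{2} - \frac{\gamma(1-\gamma)c^2}{\sigma^4} \sum (x_i - m)^4 \left(K_i - c(x_i - m)^2 J_i + \frac{c^2}{4}(x_i - m)^4 I_i\right) $$

where

$$ I_i = \frac{1}{\sqrt{2\pi}\,\sigma} \int e^{-z_i^2/2\sigma^2} f(z_i)\, dz_i $$

$$ J_i = \frac{1}{\sqrt{2\pi}\,\sigma} \int z_i\, e^{-z_i^2/2\sigma^2} f(z_i)\, dz_i $$

$$ K_i = \frac{1}{\sqrt{2\pi}\,\sigma} \int z_i^2\, e^{-z_i^2/2\sigma^2} f(z_i)\, dz_i $$

and

$$ f(z_i) = \left[(1-\gamma)\exp\left\{-\frac{c(x_i - m)^2}{2\sigma^2}\left[2z_i - c(x_i - m)^2\right]\right\} + \gamma\right]^{-1}. $$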
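
The asymptotic variances can be evaluated numerically from these formulas. The sketch below is an illustration under assumptions, not the authors' computation: the function name `asymptotic_variances`, the grid-based integration over twelve standard deviations, and the grid size are choices made here. It fills in the matrix A term by term, with the integrals taken against the normal density with mean 0 and the error variance, forms the information matrix as A divided by the fourth power of the error standard deviation, and returns the diagonal of its inverse (the variances of the estimators of the intercept, slope, c and error variance for a sample of the given x's).

```python
import numpy as np

def asymptotic_variances(x, m, gamma, c, sigma, grid=4001, half_width=12.0):
    """Diagonal of the inverse information matrix for model (15)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    xc = x - x.mean()
    d = (x - m) ** 2
    s2 = sigma ** 2

    # Grid integration of I_i, J_i, K_i against the N(0, sigma^2) density.
    z = np.linspace(-half_width * sigma, half_width * sigma, grid)
    dz = z[1] - z[0]
    phi = np.exp(-z ** 2 / (2.0 * s2)) / (np.sqrt(2.0 * np.pi) * sigma)
    expo = -(c * d[:, None] / (2.0 * s2)) * (2.0 * z[None, :] - c * d[:, None])
    f = 1.0 / ((1.0 - gamma) * np.exp(expo) + gamma)   # one row per observation
    I = np.sum(phi * f, axis=1) * dz
    J = np.sum(z * phi * f, axis=1) * dz
    K = np.sum(z ** 2 * phi * f, axis=1) * dz

    g = gamma * (1.0 - gamma)
    A = np.zeros((4, 4))
    A[0, 0] = n * s2 - g * c ** 2 * np.sum(d ** 2 * I)
    A[0, 1] = -g * c ** 2 * np.sum(d ** 2 * xc * I)
    A[0, 2] = gamma * s2 * np.sum(d) + g * c * np.sum(d ** 2 * (J - c * d * I))
    A[0, 3] = g * c ** 2 / s2 * np.sum(d ** 2 * (-J + 0.5 * c * d * I))
    A[1, 1] = s2 * np.sum(xc ** 2) - g * c ** 2 * np.sum(d ** 2 * xc ** 2 * I)
    A[1, 2] = gamma * s2 * np.sum(xc * d) + g * c * np.sum(xc * d ** 2 * (J - c * d * I))
    A[1, 3] = g * c ** 2 / s2 * np.sum(xc * d ** 2 * (-J + 0.5 * c * d * I))
    A[2, 2] = gamma * s2 * np.sum(d ** 2) - g * np.sum(
        d ** 2 * (K - 2.0 * c * d * J + c ** 2 * d ** 2 * I))
    A[2, 3] = g * c / s2 * np.sum(
        d ** 2 * (K - 1.5 * c * d * J + 0.5 * c ** 2 * d ** 2 * I))
    A[3, 3] = n / 2.0 - g * c ** 2 / s2 ** 2 * np.sum(
        d ** 2 * (K - c * d * J + 0.25 * c ** 2 * d ** 2 * I))
    A = A + np.triu(A, 1).T            # fill the symmetric lower triangle

    M = A / s2 ** 2                    # information matrix
    return np.diag(np.linalg.inv(M))

# 21 equally spaced x's on [-1, 1], m at the smallest x, and c chosen so that
# c * (x_max - x_min)**2 equals six error standard deviations.
x = np.linspace(-1.0, 1.0, 21)
v = asymptotic_variances(x, m=-1.0, gamma=0.20, c=6.0 / 4.0, sigma=1.0)
print(v[:3])
```

With the error standard deviation set to 1, the first three returned values are variance ratios of the kind tabulated in Tables 1-3, so they can be set beside the tabled entries as a check on the reconstruction.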

To demonstrate the way in which the asymptotic variances vary with
the parameters $\gamma$, f and to gain an idea of variances we might expect
in small samples we evaluated the formulas for the asymptotic variances
of $\hat{\alpha}_{ML}$, $\hat{\beta}_{ML}$, $\hat{c}_{ML}$ at several values of n (these numbers are then taken
from $M^{-1}$). As we shall see in later sections these asymptotic
formulas for the variances may provide very good approximations to the
actual variances for n's as small as 21.

Asymptotic formulas for the variances were computed for some
illustrative cases. We set $\alpha = 0.0$, $\beta = 1.0$, $m = x_{min}$ and c
positive. The x's were equally spaced from -1.0 to +1.0 with one y
observation at each x. The asymptotic variances were evaluated for
sample sizes, n, of 15, 21, 41 and $\gamma$ of .10, .20, .30, .40. Values
of $\sigma^2$ were chosen to produce relationships between x and y accounting
for 20% or 50% of the variation in y, representing a range from fair to
good fit of the line. Values of c were chosen so that the mean of the largest
possible residuals, $f = c(x_{max} - x_{min})^2$, would take on values of 0,
$\sigma$, $2\sigma$, $3\sigma$, $4\sigma$, $5\sigma$, $6\sigma$, $7\sigma$, $8\sigma$. The obtained variances for
$\hat{\alpha}_{ML}$, $\hat{\beta}_{ML}$ and $\hat{c}_{ML}$ are given in Tables 1-3.

The asymptotic formula for the variance of $\hat{\alpha}_{ML}$ can be written as
$\sigma^2$ times a function of $\gamma$, f, n and the spacing of the x's; it does not
depend on $\alpha$, $\beta$, or $\Delta = x_{max} - x_{min}$, the scaling of the x's. Therefore
in Table 1, in which x's were equally spaced for all calculations, we show

$$ \frac{\operatorname{var}\hat{\alpha}_{ML}}{\sigma^2} = K(\gamma, f, n). $$

Examination of Table 1 shows that the asymptotic variance of $\hat{\alpha}_{ML}$
decreases monotonically from a maximum value at c = 0 but remains
relatively stable across a wide range of f values from $2\sigma$ to $8\sigma$. The
variance of $\hat{\alpha}_{ML}$ increases with $\gamma$ and decreases with n.

The asymptotic variance of $\hat{\beta}_{ML}$ can be written as $\sigma^2/\Delta^2$ times a
function of $\gamma$, f, n and the spacing of the x's; it does not depend on
$\alpha$ or $\beta$. Table 2 shows that the change in the variance of $\hat{\beta}_{ML}$ with
f, $\gamma$, n is very similar to that for the variance of $\hat{\alpha}_{ML}$.

The asymptotic variance of $\hat{c}_{ML}$ can be written as $\sigma^2/\Delta^4$ times a
function of $\gamma$, f, n and the spacing of the x's. The asymptotic
variance of $\hat{c}_{ML}$ decreases rapidly as f increases until about $4\sigma$ or
$5\sigma$, at which it begins to approach an asymptote. The variance of $\hat{c}_{ML}$
decreases as n increases but it also decreases as $\gamma$ increases. For
larger $\gamma$, the effective sample size for the estimation of c increases.



Table 1

Asymptotic Variance Formula for $\hat{\alpha}_{ML}$ Evaluated for Equally Spaced x's

Entries are $\operatorname{var}\hat{\alpha}_{ML} / \sigma^2$.

$\gamma = .20$

          $c(x_{max} - x_{min})^2$
  n      0      σ      2σ     3σ     4σ     5σ     6σ     7σ     8σ
  21   .8798  .0963  .0659  .0626  .0620  .0613  .0607  .0602  .0599

$c(x_{max} - x_{min})^2 = 6\sigma$

          $\gamma$
  n     .10    .20    .30    .40
  15   .0760  .0847  .0946  .1067
  21   .0544  .0607  .0679  .0767
  41   .0279  .0312  .0350  .0396
Table 2

Asymptotic Variance of $\hat{\beta}_{ML}$ for Equally Spaced x's

Entries are $\operatorname{var}\hat{\beta}_{ML}$.

$\gamma = .20$

          $c(x_{max} - x_{min})^2$
  n      0      σ      2σ     3σ     4σ     5σ     6σ     7σ     8σ
  21   1.912  .2346  .1738  .1684  .1623  .1552  .1505  .1479  .1466

$c(x_{max} - x_{min})^2 = 6\sigma$

          $\gamma$
  n     .10    .20    .30    .40
  15   .1886  .2024  .2194  .2414
  21   .1402  .1505  .1631  .1795
  41   .0754  .0809  .0878  .0966
Table 3

Asymptotic Variance of $\hat{c}_{ML}$ for Equally Spaced x's

Entries are $\operatorname{var}\hat{c}_{ML} / \sigma^2$.

$\gamma = .20$

          $c(x_{max} - x_{min})^2$
  n      0      σ      2σ     3σ     4σ     5σ     6σ     7σ     8σ
  21   11.14  .6545  .2249  .1490  .1204  .1055  .0974  .0931  .0907

$c(x_{max} - x_{min})^2 = 6\sigma$

          $\gamma$
  n     .10    .20    .30    .40
  15   .2502  .1315  .0963  .0823
  21   .1860  .0974  .0713  .0609
  41   .1000  .0522  .0381  .0325
SMALL SAMPLE PROPERTIES OF THE MAXIMUM LIKELIHOOD ESTIMATORS

We undertook a Monte Carlo study to investigate the properties of
the maximum likelihood estimators of $\alpha$, $\beta$, c in small samples. We set
$\alpha = 0.0$, $\beta = 1.0$ and $m = x_{min}$ throughout. Eight parameter sets
specifying the values of n, $\gamma_T$, $\sigma^2$, c and the spacing of the x's were
defined and used to generate y samples; see Table 4. For each parameter
set, evaluations of the properties of the estimators were made for several
choices of $\gamma_E$, the value of $\gamma$ actually used in estimation. The basic
parameter set involved 21 equally spaced x's from 1 to 21, $\gamma_T = .20$,
$\sigma^2 = 36$, and $c(x_{max} - x_{min})^2 = 6\sigma$. We chose $\sigma^2 = 36$ to obtain a
representative situation in which x predicts 50% of the variance in y.
The values $f = 6\sigma$ and $\gamma_T = .20$ were chosen because unless outliers are
occasionally obvious by inspection it is unlikely that an outlier model
would be applied (this is also approximately the value observed in the
RJ data). The variations from this basic set of parameters include cases
in which $\sigma^2$ is reduced, c is reduced, n is reduced, the x's follow
a normal distribution, n is increased, and $\gamma_T$ is varied.

For each parameter set and choice of $\gamma_E$, 200 random samples of y
observations were generated using a random normal generator developed for
the IBM 360 by Chen (1969). For some parameters, several sets of 200
samples were generated. The maximum likelihood estimators were obtained
for each sample and the observed means and variances of the estimators
across the 200 samples were calculated.
Table 4

Parameter Sets

$\alpha = 0.0$, $\beta = 1.0$, $m = x_{min}$

                                                                               Number of samples, by $\gamma_E$
Set              n   $\gamma_T$  Spacing of x's           $\sigma^2$  $c(x_{max}-x_{min})^2$    c      .01  .05  .10  .20  .30  .40
1. Basic         21  .20   equally spaced 1 to 21     36      6σ       .09            200  200  600  200  200
2. Reduce σ²     21  .20   equally spaced 1 to 21      9      6σ       .045           200  200  400  200  200
3. Reduce c      21  .20   equally spaced 1 to 21     36      3σ       .045           200  200  200  200  200
4. Reduce n      15  .20   equally spaced 1 to 15     18.67   6σ       .1322          200  200  400  200  200
5. Vary x's      15  .20   expected normal order        .85   6σ       .45            200  200  400  200  200
                           statistics
6. Increase n    41  .20   equally spaced 1 to 41    140      6σ       .04437         200  200  400  200  200
7. Reduce γ      21  .05   equally spaced 1 to 21     36      6σ       .09       200  200  200  200
8. Increase γ    21  .40   equally spaced 1 to 21     36      6σ       .09                 200  200  200  200
The initial estimates used in the iterative maximum likelihood
solutions were the least squares estimates of $\alpha$, $\beta$ and $\sigma^2$, and c was
estimated from the largest residual from the least squares line in the
appropriate quadrant. In
general the iterative solution converged to six significant digits in each
estimator fairly rapidly. The procedure was automatically terminated
after 100 iterations. Table 5 shows the number of iterations required
for convergence for the basic parameter set with $\gamma_E = \gamma_T = .20$ and with
$\gamma_E = .40$, $\gamma_E = .05$. The median number of iterations was in the range
20-29. The number of iterations required seems to increase somewhat as
$\gamma_E$ increases.

Tables 6, 7, and 8 show the results for $\hat{\alpha}_{ML}$, $\hat{\beta}_{ML}$, and $\hat{f}$
respectively. Part (a) of each table shows the ratio of the asymptotic
variance to $\sigma^2$ for each parameter set for several $\gamma$ values. (Note
that this ratio is independent of $\sigma^2$.) These figures have been scaled
to allow comparisons with figures in Tables 1, 2, and 3 (i.e., they all
correspond to calculations made for x ranging from -1 to +1). Part (b) of
each table shows the ratio of the observed variance using $\gamma_E$ to the
asymptotic variance calculated with $\gamma_E$. Part (c) of each table shows
the ratio of the observed variance using $\gamma_E$ to the asymptotic variance
calculated with $\gamma_T$. Part (d) of each table shows the observed bias
(due to scale changes these figures are not necessarily comparable from
row to row). Part (e) shows the ratio of the squared bias to the
asymptotic variance calculated with $\gamma_T$.

Table 5

Number of Iterations Required for Convergence to Six Significant Digits
in All Estimators for Basic Parameter Set, $\gamma_T = .20$

                   $\gamma_E = .20$   $\gamma_E = .40$   $\gamma_E = .05$
No. of
Iterations         Frequency    Frequency    Frequency
   1 -  9              1            0            2
  10 - 19             63           25           94
  20 - 29             82           69           53
  30 - 39             27           33           21
  40 - 49              7           22           14
  50 - 59              5           13            9
  60 - 69              2            7            1
  70 - 79              4            8            1
  80 - 89              0            3            4
  90 - 99              2            4            0
  100+                 7           16            1

  Total              200          200          200

A guideline to the interpretation of the ratios between observed
and asymptotic variances can be obtained by the following argument. If
an estimator $\hat{\theta}$ is normally distributed, the standard deviation of its
estimated variance based on p samples is $\sqrt{2/p}\,\operatorname{var}\hat{\theta}$. Thus the
standard deviation of the ratio of the estimated variance to $\operatorname{var}\hat{\theta}$ is
approximately $\sqrt{2/p}$. So for p = 200
we would expect the observed variances to be within ±20% of the true
variance; for p = 400 and p = 600, the observed variances should be
within ±14% or ±12% respectively.
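
The ±20%, ±14% and ±12% figures quoted for 200, 400 and 600 samples are two standard deviations of the variance ratio. A minimal check of that arithmetic follows; the function name `relative_sd_of_variance` is an assumption introduced here, and the formula is the large-sample approximation for normally distributed estimates.

```python
import math

def relative_sd_of_variance(p):
    """Approximate SD of (estimated variance / true variance) from p normal estimates."""
    return math.sqrt(2.0 / p)

for p in (200, 400, 600):
    band = 2.0 * relative_sd_of_variance(p)   # two standard deviations
    print(p, round(100 * band))               # prints 200 20, 400 14, 600 12
```

Ratios in parts (b) and (c) of Tables 6 and 7 that fall inside these bands are therefore compatible with the asymptotic variance being correct.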

Behavior when $\gamma_E = \gamma_T$

The properties of $\hat{\alpha}_{ML}$ are shown in Table 6. Note that the
asymptotic variance of $\hat{\alpha}_{ML}$ is not strongly affected by c, $\gamma_T$ or the
spacing of the x's. For the parameter sets investigated here the actual
variance is only 14% to 35% larger than the asymptotic variance when
$\gamma_E = \gamma_T$. The bias is generally positive but contributes less than 1%
to the MSE of $\hat{\alpha}_{ML}$.

The asymptotic variance of $\hat{\beta}_{ML}$ depends more heavily on c and the
spacing of the x's than does the variance of $\hat{\alpha}_{ML}$. With the exception
of two cases the observed variance is no more than 15% larger than the
asymptotic variance. The bias is numerically quite small and makes a
negligible contribution to MSE.

The asymptotic variance of $\hat{c}_{ML}$ is fairly strongly affected by
changes in the parameters, especially by changes in $\gamma$. The observed
variance is considerably larger than the asymptotic variance, about 2 to
6 times larger for these cases. The bias is generally negative,
indicating that c is underestimated on the average. The contribution
of bias to MSE ranges from 4 to 22% except for the case where $\gamma_T = .05$.

How nearly normal are the distributions of $\hat{\alpha}$, $\hat{\beta}$, $\hat{c}$ in small samples?
Histograms of the distributions of the estimators for the 600 samples
generated by the basic parameter set with $\gamma = .20$, n = 21 are shown in



Table 6

Properties of $\hat{\alpha}_{ML}$

a) Ratio of asymptotic variance of $\hat{\alpha}_{ML}$ to $\sigma^2$

Parameter set          $\gamma$
No.   n   $\gamma_T$     .01     .05     .10     .20     .30     .40
 1   21   .2     .0485   .0513   .0544   .0607   .0679   .0767
 2   21   .2     .0485   .0513   .0544   .0607   .0679   .0767
 3   21   .2     .0486   .0515   .0550   .0626   .0716   .0828
 4   15   .2     .0679   .0717   .0760   .0847   .0946   .1067
 5   15   .2     .0681   .0726   .0777   .0882   .0998   .1137
 6   41   .2     .0248   .0263   .0279   .0312   .0350   .0396
 7   21   .05    .0485   .0513   .0544   .0607   .0679   .0767
 8   21   .4     .0485   .0513   .0544   .0607   .0679   .0767

b) Ratio of observed variance using $\gamma_E$ to asymptotic variance with $\gamma_E$

Parameter set          $\gamma_E$
No.   n   $\gamma_T$     .01     .05     .10     .20     .30     .40
 1   21   .2             1.37    1.62    1.21    1.06    1.07
 2   21   .2             2.02    1.38    1.14    1.24     .93
 3   21   .2             1.08    1.41    1.27    1.28     .95
 4   15   .2             1.46    1.66    1.34    1.08    1.29
 5   15   .2             1.66    1.56    1.17    1.14    1.06
 6   41   .2             1.63    1.06    1.18     .92    1.18
 7   21   .05    1.12    1.19    1.03     .89
 8   21   .4                     3.86    2.07    1.38    1.31

c) Ratio of observed variance using $\gamma_E$ in estimation to asymptotic
variance with $\gamma_T$

Parameter set          $\gamma_E$
No.   n   $\gamma_T$     .01     .05     .10     .20     .30     .40
 1   21   .2             1.16    1.45    1.21    1.19    1.35
 2   21   .2             1.72    1.23    1.14    1.39    1.17
 3   21   .2              .89    1.24    1.27    1.47    1.24
 4   15   .2             1.23    1.49    1.34    1.21    1.63
 5   15   .2             1.37    1.38    1.17    1.29    1.37
 6   41   .2             1.37     .95    1.18    1.03    1.52
 7   21   .05    1.05    1.19    1.09    1.06
 8   21   .40                    2.73    1.64    1.22    1.31
Table 6 (Continued)

Properties of $\hat{\alpha}_{ML}$

d) Bias in $\hat{\alpha}_{ML}$ when $\gamma_E$ is used in estimation

Parameter set          $\gamma_E$
No.   n   $\gamma_T$     .01     .05     .10     .20     .30     .40
 1   21   .20            .548    .532    .066   -.281   -.540
 2   21   .20            .477    .268    .021   -.166   -.157
 3   21   .20            .749    .200    .135   -.415   -.625
 4   15   .20            .493    .365    .033   -.330   -.212
 5   15   .20            .117    .099    .007   -.044   -.103
 6   41   .20           1.354    .691    .072   -.595   -.947
 7   21   .05    .024   -.047    .012   -.338
 8   21   .40                   2.111    .931    .315    .056

e) Squared bias in $\hat{\alpha}_{ML}$ as a percent of the asymptotic variance using $\gamma_T$

Parameter set          $\gamma_E$
No.   n   $\gamma_T$     .01     .05     .10     .20     .30     .40
 1   21   .2              14      13       0       4      14
 2   21   .2              50      13       0       5       5
 3   21   .2              25       2       1       8      17
 4   15   .2              16       8       0       6       3
 5   15   .2              18      14       0       3      14
 6   41   .2              42      11       0       3      21
 7   21   .05      0       0       0       6
 8   21   .40                    162      31       4       1
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Table 7 

Properties of $\hat{\beta}_{ML}$

a) Ratio of asymptotic variance of $\hat{\beta}_{ML}$ to $\sigma^2$

Parameter set            $\gamma$
No.    n   $\gamma_T$      .01      .05      .10      .20      .30      .40
  1   21       .20       .1314    .1354    .1402    .1505    .1631    .1795
  2   21       .20       .1314    .1354    .1402    .1505    .1631    .1795
  3   21       .20       .1327    .1409    .1500    .1684    .1893    .2148
  4   15       .20       .1770    .1823    .1886    .2024    .2194    .2414
  5   15       .20       .2389    .2468    .2559    .2752    .2983    .3276
  6   41       .20       .0705    .0728    .0754    .0809    .0878    .0966
  7   21       .05       .1314    .1354    .1402    .1505    .1631    .1795
  8   21       .40       .1314    .1354    .1402    .1505    .1631    .1795

b) Ratio of observed variance of $\hat{\beta}_{ML}$ using $\gamma_E$ to asymptotic variance using $\gamma_E$

Parameter set            $\gamma_E$ used in estimation
No.    n   $\gamma_T$      .01      .05      .10      .20      .30      .40
  1   21       .20                 1.60     1.34      .97     1.11     1.20
  2   21       .20                 2.34     1.52     1.05     1.31      .88
  3   21       .20                 1.24     1.36     1.10     1.30      .89
  4   15       .20                 2.04     1.88     1.49     1.11     1.17
  5   15       .20                 1.95     1.56     1.14     1.18     1.34
  6   41       .20                 1.36     1.37     1.09      .81      .87
  7   21       .05        1.14     1.15      .89     1.16
  8   21       .40                          4.52     1.88     1.45     1.43


c) Ratio of observed variance when $\gamma_E$ used in estimation to asymptotic
variance using $\gamma_T$

Parameter set            $\gamma_E$ used in estimation
No.    n   $\gamma_T$      .01      .05      .10      .20      .30      .40
  1   21       .20                 1.44     1.25      .97     1.21     1.43
  2   21       .20                 2.12     1.42     1.05     1.43     1.05
  3   21       .20                 1.02     1.21     1.10     1.47     1.14
  4   15       .20                 1.33     1.74     1.49     1.20     1.39
  5   15       .20                 1.76     1.45     1.14     1.27     1.59
  6   41       .20                 1.22     1.28     1.09      .88     1.04
  7   21       .05        1.10     1.15      .92     1.28
  8   21       .40                          3.50     1.56     1.31     1.43
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Table 7 (Continued) 



Properties of $\hat{\beta}_{ML}$

d) Bias in $\hat{\beta}_{ML}$ when $\gamma_E$ used in estimation

Parameter set            $\gamma_E$ used in estimation
No.    n   $\gamma_T$      .01      .05      .10      .20      .30      .40
  1   21       .20                .0445    .0151   -.0007    .0095   -.0363
  2   21       .20                .0444    .0160    .0075    .0108    .0125
  3   21       .20                .0608    .0348    .0335   -.0540    .0560
  4   15       .20                .0434    .0631    .0332   -.0189   -.0473
  5   15       .20                .0507    .0187    .0038   -.0091   -.0414
  6   41       .20                .0226    .0308   -.0035   -.0061   -.0171
  7   21       .05       .0009    .0236   -.0266   -.0027
  8   21       .40                         .2201    .0716    .0283    .0186

e) Squared bias in $\hat{\beta}_{ML}$ as a percent of the asymptotic variance using $\gamma_T$

Parameter set            $\gamma_E$ used in estimation
No.    n   $\gamma_T$      .01      .05      .10      .20      .30      .40
  1   21       .20                    4        0        0        0        0
  2   21       .20                   15        2        1        1        0
  3   21       .20                    5        2        2        5        3
  4   15       .20                    3        6        2        0        3
  5   15       .20                    4        0        0        0        2
  6   41       .20                    2        3        0        0        1
  7   21       .05           0        1        1        0
  8   21       .40                             74        8        2        1
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Table 8 
Properties of $\hat{c}_{ML}$

a) Ratio of asymptotic variance of $\hat{c}_{ML}$ to $\sigma^2$

Parameter set            $\gamma$
No.    n   $\gamma_T$      .01      .05      .10      .20      .30      .40
  1   21       .20      2.0367    .3754    .1860    .0974    .0713    .0609
  2   21       .20      2.0367    .3754    .1860    .0974    .0713    .0609
  3   21       .20      4.0500    .6515    .3046    .1490    .1040    .0859
  4   15       .20      2.7203    .5037    .2502    .1315    .0963    .0823
  5   15       .20      3.6619    .6705    .3309    .1721    .1252    .1065
  6   41       .20      1.1051    .2024    .1000    .0522    .0381    .0325
  7   21       .05      2.0367    .3754    .1860    .0974    .0713    .0609
  8   21       .40      2.0367    .3754    .1860    .0974    .0713    .0609

b) Ratio of observed variance using $\gamma_E$ to asymptotic variance using $\gamma_E$

Parameter set            $\gamma_E$ used in estimation
No.    n   $\gamma_T$      .01      .05      .10      .20      .30      .40
  1   21       .20                 1.07     2.90     4.47     5.30     8.55
  2   21       .20                 1.29     2.46     5.25     6.49     6.72
  3   21       .20                  .55     1.04     1.95     6.07     2.74
  4   15       .20                 1.46     1.89     6.10     5.30     7.58
  5   15       .20                 1.29     2.36     3.78     5.08     4.80
  6   41       .20                  .99     1.50     2.97     5.95     8.41
  7   21       .05         .55     2.77     6.12     9.90
  8   21       .40                          1.24     1.19     1.37     3.32

c) Ratio of observed variance when $\gamma_E$ used in estimation to asymptotic
variance using $\gamma_T$

Parameter set            $\gamma_E$ used in estimation
No.    n   $\gamma_T$      .01      .05      .10      .20      .30      .40
  1   21       .20                 4.50     5.51     4.47     3.86     5.33
  2   21       .20                 4.97     4.69     5.25     4.74     4.20
  3   21       .20                 2.41     2.31     1.95     4.24     1.58
  4   15       .20                 5.57     3.61     6.10     3.88     4.74
  5   15       .20                 5.04     4.54     3.78     3.70     2.97
  6   41       .20                 3.84     2.87     2.97     4.35     5.24
  7   21       .05        3.00     2.77     3.03     2.56
  8   21       .40                          3.79     1.90     1.60     3.32
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Table 8 (Continued) 



Properties of $\hat{c}_{ML}$

d) Bias in $\hat{c}_{ML}$ when $\gamma_E$ used in estimation

Parameter set            $\gamma_E$ used in estimation
No.    n   $\gamma_T$      .01      .05      .10      .20      .30      .40
  1   21       .20               -.0071   -.0131   -.0087   -.0116   -.0149
  2   21       .20               -.0025   -.0026   -.0028   -.0081   -.0087
  3   21       .20               -.0072   -.0018   -.0071    .0050   -.0082
  4   15       .20               -.0121   -.0217   -.0138   -.0158   -.0295
  5   15       .20               -.0525   -.0493   -.0570   -.0397   -.0832
  6   41       .20               -.0017    .0015   -.0018   -.0042   -.0067
  7   21       .05      -.0347   -.0364   -.0363   -.0339
  8   21       .40                        -.0072   -.0021   -.0003   -.0032

e) Squared bias in $\hat{c}_{ML}$ as a percent of the asymptotic variance using $\gamma_T$

Parameter set            $\gamma_E$ used in estimation
No.    n   $\gamma_T$      .01      .05      .10      .20      .30      .40
  1   21       .20                   14       49       22       38       63
  2   21       .20                    7        8        9       75       87
  3   21       .20                   10        1        9        4       13
  4   15       .20                   15       46       19       24       86
  5   15       .20                   17       15       20       10       43
  6   41       .20                    6        5        7       39       99
  7   21       .05          88       98       98       75
  8   21       .40                             24        2        0        4
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Figures 2-4. These distributions appear reasonably symmetric and 
well-behaved, especially the distribution of $\hat{\beta}_{ML}$. Note the second
peak at zero in the distribution of $\hat{c}_{ML}$.

In summary then, for $\gamma$ known, $\hat{\alpha}_{ML}$ and $\hat{\beta}_{ML}$ appear to
behave well for samples as small as 21. The bias is not large, and the
asymptotic variance formula, if inflated by 20 to 50%, could reasonably be
used to provide some estimates of precision. The estimator of $c$ performs
poorly by contrast: it is an underestimate on the average and much more
variable than asymptotic results would indicate. This is hardly
surprising since the effective sample size for the estimation of $c$ is
of the order of .

Robustness to Inaccurate Specification of $\gamma$

Now that we have assessed the small sample behavior of $\hat{\alpha}_{ML}$,
$\hat{\beta}_{ML}$, and $\hat{c}_{ML}$ when the true $\gamma$ is known, we need to
evaluate how misled we will be if the wrong value of $\gamma$ is used in the
estimation procedure. For each parameter set we have run sets of 200
samples in which the value of $\gamma$ used in the estimation procedure,
$\gamma_E$, is not equal to the true value, $\gamma_T$. Changes in the observed
variance and observed bias due to inaccurate specification of $\gamma$ are
shown in Tables 6, 7, and 8.
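
The quantities tabulated in panels c), d) and e) of Tables 6 to 8 can be computed from a set of replicate estimates as in the following minimal sketch. The function name `summarize`, its arguments, and the use of the n - 1 divisor for the observed variance are assumptions made for illustration; the numbers in the usage line are made up and are not from the study.

```python
from statistics import mean, variance


def summarize(estimates, true_value, asymptotic_variance):
    """Summarize Monte Carlo replicate estimates of a single parameter.

    estimates           -- one estimate per simulated sample
    true_value          -- parameter value used to generate the samples
    asymptotic_variance -- variance from the asymptotic formula (at gamma_T)
    """
    bias = mean(estimates) - true_value
    observed_variance = variance(estimates)  # divisor n - 1 (an assumption)
    return {
        # panel d): observed bias
        "bias": bias,
        # panel c): observed variance relative to the asymptotic variance
        "variance_ratio": observed_variance / asymptotic_variance,
        # panel e): squared bias as a percent of the asymptotic variance
        "squared_bias_pct": 100.0 * bias ** 2 / asymptotic_variance,
    }


# Toy illustration with made-up numbers.
stats = summarize([1.0, 2.0, 3.0], true_value=1.5, asymptotic_variance=2.0)
print(stats)
```

In the study each list of estimates would hold the 200 replicate values obtained for one parameter set and one value of $\gamma_E$.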

For $\hat{\alpha}_{ML}$ we note that inaccurate specification of $\gamma$ does not
seem to have an appreciable effect on the size of the variance. Table 6c
shows that the ratio of the observed to the asymptotic variance using
$\gamma_T$ was generally less than 1.7. The bias in $\hat{\alpha}_{ML}$ is much more
directly affected by $\gamma_E$; it is quite close to zero when
$\gamma_E = \gamma_T$, becoming moderate and positive for $\gamma_E < \gamma_T$
($\gamma$ underestimated), and moderate and negative for $\gamma_E > \gamma_T$
($\gamma$ overestimated). That is, underestimation of $\gamma$ leads to
overestimation of $\alpha$ and overestimation of $\gamma$ leads to
underestimation of $\alpha$. Although using a $\gamma$ of .05 or .40 when the
true value is .20 is quite a large error, the bias at these extremes
generally contributes only about 20% to the MSE.

The situation for $\hat{\beta}_{ML}$ is very similar, although inaccurate
specification of $\gamma$ seems to have a somewhat larger effect on the
variance. Again, the bias tends to switch from positive to negative as
we go from underestimation to overestimation of $\gamma$. However, the bias
is generally of negligible size even at the extremes.

For $\hat{c}_{ML}$, inaccurate specification of $\gamma$ does not exhibit any
appreciable tendency to inflate the variance. The observed variance is
much more strongly influenced by the value of $\gamma_T$ than by $\gamma_E$,
tending to be comparatively stable across a row. The effect on the size
of the consistently negative bias is variable, with overestimation of
$\gamma$ considerably worse than underestimation.

In summary then, even for relatively small samples the maximum
likelihood estimators of $\alpha$, $\beta$, $c$ are robust to inaccurate
specification of $\gamma$. Their variances are only moderately affected by
differences between $\gamma_E$ and $\gamma_T$, and bias becomes a serious
problem only for $\hat{c}_{ML}$ when $\gamma$ is overestimated.
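
The share of the mean squared error due to bias can be read off the tables by combining panels c) and e), since MSE = variance + bias squared. The sketch below assumes that both panels are expressed relative to the same denominator (the asymptotic variance at $\gamma_T$); the function name `bias_share_of_mse` is hypothetical.

```python
def bias_share_of_mse(squared_bias_pct, variance_ratio):
    """Fraction of the MSE due to squared bias.

    squared_bias_pct -- panel e): 100 * bias^2 / asymptotic variance
    variance_ratio   -- panel c): observed variance / asymptotic variance
    Both are assumed to use the asymptotic variance at gamma_T.
    """
    bias_sq = squared_bias_pct / 100.0
    return bias_sq / (bias_sq + variance_ratio)


# Table 8 (c), parameter set 1, gamma_E = .40: e) gives 63, c) gives 5.33
share_c = bias_share_of_mse(63, 5.33)
# Table 7 (beta), parameter set 1, gamma_E = .05: e) gives 4, c) gives 1.44
share_beta = bias_share_of_mse(4, 1.44)
print(round(share_c, 3), round(share_beta, 3))
```

Under these assumptions the bias accounts for roughly a tenth of the MSE of $\hat{c}_{ML}$ in the first case and under 3% of the MSE of $\hat{\beta}_{ML}$ in the second.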
COMPARISON OF MAXIMUM LIKELIHOOD ESTIMATORS AND ORDINARY LEAST
SQUARES ESTIMATORS OF $\alpha$, $\beta$

How much can we generally expect to gain by using the maximum
likelihood estimators of $\alpha$, $\beta$ rather than the ordinary least squares
estimators, whose computation ignores the presence of outliers? The
ordinary least squares estimators are given by

\[ \hat{\alpha}_{LS} = \bar{y} \]

\[ \hat{\beta}_{LS} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \]

\[ \hat{\sigma}^2_{LS} = \frac{\sum \left( y_i - \bar{y} - \hat{\beta}_{LS} (x_i - \bar{x}) \right)^2}{n-2} \]

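To derive the expected values and variances of these estimators under
the quadratic outlier model we note that

\[ E(y_i) = \alpha + \beta (x_i - \bar{x}) + \gamma c (x_i - m)^2 \]

and

\[ \mathrm{Var}(y_i) = \sigma^2 + \gamma (1-\gamma) c^2 (x_i - m)^4 . \]

Thus under the outlier model

\[ E(\hat{\alpha}_{LS}) = \alpha + \frac{\gamma c}{n} \sum (x_i - m)^2 \]

\[ \mathrm{Var}(\hat{\alpha}_{LS}) = \frac{\sigma^2}{n} + \frac{\gamma (1-\gamma) c^2}{n^2} \sum (x_i - m)^4 \]

\[ E(\hat{\beta}_{LS}) = \beta + \gamma c \, \frac{\sum (x_i - \bar{x})(x_i - m)^2}{\sum (x_i - \bar{x})^2} \]

\[ \mathrm{Var}(\hat{\beta}_{LS}) = \frac{\sigma^2}{\sum (x_i - \bar{x})^2} + c^2 \gamma (1-\gamma) \, \frac{\sum (x_i - \bar{x})^2 (x_i - m)^4}{\left[ \sum (x_i - \bar{x})^2 \right]^2} \]

\[ E(\hat{\sigma}^2_{LS}) = \sigma^2 + \frac{\gamma c^2}{n-2} \left[ \sum (x_i - m)^4 - \frac{1}{n} \left[ (1-\gamma) \sum (x_i - m)^4 + \gamma \left( \sum (x_i - m)^2 \right)^2 \right] \right] - \frac{\gamma c^2}{(n-2) \sum (x_i - \bar{x})^2} \left[ (1-\gamma) \sum (x_i - \bar{x})^2 (x_i - m)^4 + \gamma \left( \sum (x_i - \bar{x})(x_i - m)^2 \right)^2 \right] \]

The estimators of $\alpha$ and $\beta$ are inflated by terms in $\gamma c$ and the
x's; their variances are increased by terms in $c^2 \gamma (1-\gamma)$ and the
x's.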
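
A small simulation can be used to check the expected values of the least squares estimators under the outlier model. The sketch below assumes the model in which each observation independently receives the shift $c(x_i - m)^2$ with probability $\gamma$ and normal errors with standard deviation $\sigma$; it compares Monte Carlo means with $\alpha + (\gamma c / n) \sum (x_i - m)^2$ and $\beta + \gamma c \sum (x_i - \bar{x})(x_i - m)^2 / \sum (x_i - \bar{x})^2$. The function names, the seed, the 4000 replicates and the value c = 1.5 are illustrative assumptions.

```python
import random


def simulate_ls(x, alpha, beta, sigma, gamma, c, m, rng):
    """Draw one sample from the outlier model; return (alpha_ls, beta_ls)."""
    n = len(x)
    xbar = sum(x) / n
    y = []
    for xi in x:
        mu = alpha + beta * (xi - xbar)
        if rng.random() < gamma:      # outlier with probability gamma
            mu += c * (xi - m) ** 2
        y.append(rng.gauss(mu, sigma))
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    return ybar, sxy / sxx


def expected_ls(x, alpha, beta, gamma, c, m):
    """Expected values of the least squares estimators under the model."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    e_alpha = alpha + gamma * c * sum((xi - m) ** 2 for xi in x) / n
    e_beta = beta + gamma * c * sum(
        (xi - xbar) * (xi - m) ** 2 for xi in x) / sxx
    return e_alpha, e_beta


# Design used for Table 9: alpha = 0, beta = 1, 21 equally spaced x's on
# [-1, 1], m = -1.  sigma = 1 and c = 1.5 are illustrative choices.
x = [-1.0 + 0.1 * k for k in range(21)]
rng = random.Random(0)
reps = 4000
draws = [simulate_ls(x, 0.0, 1.0, 1.0, 0.2, 1.5, -1.0, rng)
         for _ in range(reps)]
mc_alpha = sum(d[0] for d in draws) / reps
mc_beta = sum(d[1] for d in draws) / reps
e_alpha, e_beta = expected_ls(x, 0.0, 1.0, 0.2, 1.5, -1.0)
print(e_alpha, mc_alpha, e_beta, mc_beta)
```

For this design the formulas give an expected intercept of 0.41 and an expected slope of 1.6, so both least squares estimators are pulled upward by the outliers.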

The maximum improvement obtainable from using the maximum likelihood
estimators can be assessed by looking at
$\mathrm{MSE}(\hat{\alpha}_{LS}) / \mathrm{MSE}(\hat{\alpha}_{ML})$ and
$\mathrm{MSE}(\hat{\beta}_{LS}) / \mathrm{MSE}(\hat{\beta}_{ML})$, where the
asymptotic formulas for the ML estimators are used. (Since $\hat{\alpha}_{ML}$,
$\hat{\beta}_{ML}$ are asymptotically unbiased, MSE = var.) Calculations for
$\alpha = 0$, $\beta = 1$, equally spaced x's between -1.0 and +1.0, $m = -1.0$,
are displayed for several values of $\gamma$, $n$, $c$ in Table 9. Improvement
from using the ML estimators is rapid with increases in $\gamma$, $c$, $n$. For
$n = 21$, $\gamma = .2$, $f = 6\sigma$, the mean squared error using the least
squares estimators is almost five times that using the maximum likelihood
estimators.
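
The "almost five times" figure for the slope can be approximately reproduced from the formulas for the bias and variance of the least squares slope. The sketch below rests on two assumptions that are not stated in this section: that $f = c(x_{max} - x_{min})^2$, so that $f = 6\sigma$ corresponds to $c = 1.5\sigma$ for x's on [-1, 1], and that the asymptotic variance of $\hat{\beta}_{ML}$ for this design is the value $.1505\sigma^2$ listed in Table 7a for $\gamma = .20$ and $n = 21$. The function name `ls_slope_mse` is hypothetical.

```python
def ls_slope_mse(x, sigma, gamma, c, m):
    """MSE of the least squares slope under the quadratic outlier model."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    bias = gamma * c * sum((xi - xbar) * (xi - m) ** 2 for xi in x) / sxx
    var = sigma ** 2 / sxx + c ** 2 * gamma * (1 - gamma) * sum(
        (xi - xbar) ** 2 * (xi - m) ** 4 for xi in x) / sxx ** 2
    return var + bias ** 2


x = [-1.0 + 0.1 * k for k in range(21)]   # 21 equally spaced x's on [-1, 1]
sigma = 1.0
# Assumption: f = c * (x_max - x_min)^2 = 6 * sigma, hence c = 1.5 * sigma.
c = 6.0 * sigma / (x[-1] - x[0]) ** 2
mse_ls = ls_slope_mse(x, sigma, 0.2, c, -1.0)
# Assumption: asymptotic ML variance taken from Table 7a (gamma = .20, n = 21).
avar_ml = 0.1505 * sigma ** 2
ratio = mse_ls / avar_ml
print(round(mse_ls, 4), round(ratio, 2))
```

Under these assumptions the ratio comes out close to the 4.96 listed in Table 9 for $n = 21$, $\gamma = .2$; without the square on $(x_{max} - x_{min})$ the same calculation gives a ratio above 17, which is why the squared form is assumed here.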

The ratios of mean squared errors observed in the Monte Carlo study
are shown in Tables 10 and 11. Although the observed advantage of the
maximum likelihood estimators of $\alpha$ and $\beta$ is less than indicated
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Table 9 



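a) Asymptotic formulas for $\mathrm{MSE}(\hat{\alpha}_{LS}) / \mathrm{MSE}(\hat{\alpha}_{ML})$

$\gamma = .20$, $n = 21$

  $c(x_{max} - x_{min})^2$     MSE ratio
  (heading illegible)            .06
  (heading illegible)            .5?
  $2\sigma$                     1.11
  $3\sigma$                     1.67
  $4\sigma$                     2.39
  $5\sigma$                     3.34
  $6\sigma$                     4.52

$c(x_{max} - x_{min})^2 = 6\sigma$

             $\gamma$
    n      .1       .2
   21     2.26     4.52
   41     2.90     6.97

b) Asymptotic formulas for $\mathrm{MSE}(\hat{\beta}_{LS}) / \mathrm{MSE}(\hat{\beta}_{ML})$

$\gamma = .20$, $n = 21$

  $c(x_{max} - x_{min})^2$     MSE ratio
  (heading illegible)            .07
  (heading illegible)            .62
  $2\sigma$                     1.14
  $3\sigma$                     1.63
  $4\sigma$                     2.47
  $5\sigma$                     3.59
  $6\sigma$                     4.96

$c(x_{max} - x_{min})^2 = 6\sigma$

             $\gamma$
    n      .1       .2
   21     2.57     4.96
   41     3.08     6.95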
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Table 10 



Observed $\mathrm{MSE}(\hat{\alpha}_{LS}) / \mathrm{MSE}(\hat{\alpha}_{ML})$

Parameter set            $\gamma_E$ used in estimation
No.    n   $\gamma_T$      .01      .05      .10      .20      .30      .40
  1   21       .20                 3.09     3.34     3.90     3.45     2.67
  2   21       .20                 2.34     3.68     4.01     2.70     3.96
  3   21       .20                 1.39     1.37     1.36     1.08     1.04
  4   15       .20                 2.47     2.52     3.05     2.64     2.30
  5   15       .20                 1.87     2.24     2.48     2.61     2.23
  6   41       .20                 3.52     6.78     8.31     6.06     4.01
  7   21       .05        1.24     1.22     1.44     1.20
  8   21       .40                          2.68     5.67     8.72     7.95


Table 11

Observed $\mathrm{MSE}(\hat{\beta}_{LS}) / \mathrm{MSE}(\hat{\beta}_{ML})$

Parameter set            $\gamma_E$ used in estimation
No.    n   $\gamma_T$      .01      .05      .10      .20      .30      .40
  1   21       .20                 3.02     4.09     4.78     4.28     3.36
  2   21       .20                 2.55     3.87     4.76     3.47     4.94
  3   21       .20                 1.28     1.36     1.68     1.12     1.39
  4   15       .20                 2.37     2.77     3.44     3.53     2.77
  5   15       .20                 2.20     2.35     2.92     2.69     2.74
  6   41       .20                 4.36     5.92     6.33     8.06     6.26
  7   21       .05        1.38     1.61     1.28     1.48
  8   21       .40                          2.86     6.77     8.81     7.46
by asymptotic results, it is still considerable. The MSE using standard
least squares is at least 2.4 times that using the maximum likelihood
estimators with the true $\gamma$ for all but the cases with $f = 3\sigma$ and
$\gamma = .05$. The maximum likelihood estimators still perform better than
the least squares estimators even when the estimated $\gamma$ is way off. The
advantage of the maximum likelihood estimators increases rapidly with
small increases in sample size.

Comparisons were also made between the maximum likelihood
estimators and the ordinary least squares estimators for a quadratic
regression. However, for x's symmetric about $\bar{x}$, the two least squares
estimators are identical and MSE $\hat{\alpha}$ was little different in the two
situations.
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APPLICATIONS 

I became interested in the problem of outliers in regression when I
undertook with Professor Richard Snow at Stanford a reanalysis of the
data reported on by Rosenthal and Jacobson in their book Pygmalion in
the Classroom. All the children in a particular grade school were given
a preliminary IQ test. Then, one-fifth of the children were selected at
random and their teachers told that these experimental children were
expected to bloom intellectually very soon. Months later all the
children, both experimental and control groups, were retested with the
same IQ test. One way to assess differences between the two groups is
to compare the regression of posttest IQ on pretest IQ. However, we soon
found that although a straight line seemed to describe the majority of
children fairly well, some children had excessively high IQ's on the
retest.

Look at Figure 5, which shows pre and post Total IQ scores for the 19
experimental group children in the first and second grades. One child with
a pretest IQ score of 139 has a posttest IQ score of 202. Figure 6 shows
the Verbal IQ results for the third- and fourth-grade experimental group.
Figure 7 shows the Verbal IQ results for the fifth- and sixth-grade
experimental group. Other similar patterns appear for other groups in the
experiment; except for the first- and second-grade Reasoning subtest, which
has some excessively low pretest scores, the general picture is the same
for Verbal and Reasoning subtests for all grades. Most of the points seem
to lie on a straight line, but some children with high pretest scores have
excessively high posttest scores. Thus we have a problem where outliers
seem to occur only for high values of x.
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Figure 5: Pre and Post Total IQ scores for 19 experimental group children 




in grades 1 & 2 (Note that both scales start at 50.) 
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Figure 6: Pre and Post Verbal IQ scores for 26 experimental group children 

in grades 3 & 4 (Note that both scales start at 50.)
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Figure 7: Pre and Post Verbal IQ scores for 23 experimental group children 

in grades 5 & 6 (Note that both scales start at 50.) 
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In this problem the apparent outliers may be partially due to the IQ 
transformation of the raw scores. At the extremes of the score distribu- 
tion, one question right or wrong can make many points difference in IQ. 
However, the raw scores were no longer available to us, and it is the IQ 
scores which generally receive the psychological interpretation. 

We then applied our outlier model to the estimation of $\alpha$ and $\beta$
for three sets of data from the Rosenthal experiments. Tables 12, 13,
and 14 show the results for grades 1 and 2, grades 3 and 4, and grades
5 and 6, the data shown in the scatterplots. The iterative solutions of
the maximum likelihood equations converged to at least six significant
digits after 20-25 iterations.
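
One way to obtain such estimates iteratively is sketched below. It is not the iteration used in the paper, whose likelihood equations are not reproduced in this section, but an EM-style substitute for the same mixture likelihood with $\gamma$ held fixed. It assumes each observation has mean $\alpha + \beta(x_i - \bar{x})$, plus $c(x_i - m)^2$ with probability $\gamma$, and a common normal error variance. The function names and the simulated data (hypothetical pretest scores from 60 to 140) are illustrative only.

```python
import math
import random


def solve3(a, b):
    """Solve the 3x3 system a z = b by Gaussian elimination with pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for k in range(col, 4):
                m[r][k] -= factor * m[col][k]
    z = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        z[r] = (m[r][3] - sum(m[r][k] * z[k] for k in range(r + 1, 3))) / m[r][r]
    return z


def normal_pdf(r, s2):
    return math.exp(-r * r / (2.0 * s2)) / math.sqrt(2.0 * math.pi * s2)


def em_fit(x, y, gamma, m, iterations=50):
    """EM iterations for (alpha, beta, c, sigma^2) with gamma held fixed."""
    n = len(x)
    xbar = sum(x) / n
    u = [xi - xbar for xi in x]        # centered regressor
    q = [(xi - m) ** 2 for xi in x]    # quadratic outlier regressor
    alpha, beta, c = sum(y) / n, 0.0, 0.0
    s2 = sum((yi - alpha) ** 2 for yi in y) / n
    loglik = []
    for _ in range(iterations):
        # E-step: probability that each point carries the outlier shift.
        w, ll = [], 0.0
        for ui, qi, yi in zip(u, q, y):
            r0 = yi - alpha - beta * ui
            p0 = (1.0 - gamma) * normal_pdf(r0, s2)
            p1 = gamma * normal_pdf(r0 - c * qi, s2)
            w.append(p1 / (p0 + p1))
            ll += math.log(p0 + p1)
        loglik.append(ll)              # log-likelihood at current parameters
        # M-step: weighted normal equations in (alpha, beta, c).
        swq = sum(wi * qi for wi, qi in zip(w, q))
        swuq = sum(wi * ui * qi for wi, ui, qi in zip(w, u, q))
        a = [[n, sum(u), swq],
             [sum(u), sum(ui * ui for ui in u), swuq],
             [swq, swuq, sum(wi * qi * qi for wi, qi in zip(w, q))]]
        b = [sum(y), sum(ui * yi for ui, yi in zip(u, y)),
             sum(wi * qi * yi for wi, qi, yi in zip(w, q, y))]
        alpha, beta, c = solve3(a, b)
        s2 = sum((1.0 - wi) * (yi - alpha - beta * ui) ** 2
                 + wi * (yi - alpha - beta * ui - c * qi) ** 2
                 for wi, ui, qi, yi in zip(w, u, q, y)) / n
    return alpha, beta, c, s2, loglik


# Simulated data for illustration only (not the Rosenthal data).
rng = random.Random(1)
x = [60.0 + 4.0 * k for k in range(21)]
xbar = sum(x) / len(x)
y = []
for xi in x:
    shift = 0.0125 * (xi - 60.0) ** 2 if rng.random() < 0.2 else 0.0
    y.append(110.0 + 0.6 * (xi - xbar) + shift + rng.gauss(0.0, 10.0))

alpha_hat, beta_hat, c_hat, s2_hat, loglik = em_fit(x, y, gamma=0.2, m=60.0)
print(alpha_hat, beta_hat, c_hat, s2_hat)
```

Each EM step cannot decrease the likelihood, so the recorded `loglik` sequence gives a simple convergence check; a stopping rule based on changes in the sixth significant digit could replace the fixed iteration count.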

Look first at Table 12. When standard least squares was used we
obtained $\hat{\alpha} = 117$, $\hat{\beta} = .93$ and $s^2 = 376$. When the one
"obvious outlier" was removed, $\hat{\alpha} = 112$, $\hat{\beta} = .58$ and
$s^2 = 159$ using standard least squares. The maximum likelihood estimates
obtained with $\gamma = .05$ are $\hat{\alpha} = 113$, $\hat{\beta} = .58$,
$s^2 = 141$, $\hat{c} = .0126$. Notice that these estimates change very
little for values of $\gamma$ ranging from .01 to .20, and how similar they
are to those obtained by deleting the outlier and using standard least
squares. The estimate of $c$ is very close to that obtained by fitting the
bias term through the outlier point. We also tried $\gamma = .001$ and
obtained $\hat{\alpha} = 116.4$, $\hat{\beta} = .90$, $s^2 = 339$, very
similar to standard least squares on all the data.
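
As a rough check on the scale of these estimates, the outlier shift implied by the fitted model can be worked out directly. This assumes the outlier term has the form $c(x - m)^2$ used in the moment formulas of the preceding section, with $m = 60$ as given in Table 12 and the pretest score $x = 139$ of the extreme child:

```latex
\hat{c}\,(x - m)^2 = .0126 \times (139 - 60)^2 = .0126 \times 6241 \approx 78.6
```

Under that assumption the fitted quadratic term places an outlying posttest score at this pretest value roughly 79 IQ points above the basic line. The residual variances quoted in the text also agree with the standard errors listed in Table 12: $19.39^2 \approx 376$ and $12.63^2 \approx 159.5$.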

For grades 3 and 4 the data resemble the grades 1 and 2 data with
one outlier, but there are two points near the line for very large
x; that is, the basic line appears better defined, and the outlier is
farther out. Here even for $\gamma = .001$ the results were very little
affected by choice of $\gamma$ and resembled those for standard least squares
with the outlier deleted.

Our results for grades 5 and 6 were very similar. The choice of
$\gamma$ had little effect on the estimates of $\alpha$, $\beta$, $c$ for
$\gamma = .01$ to .30; $s^2$ was most affected. For $\gamma = .001$ results
were close to the standard least squares on all the data. These data do
not look like a one-outlier problem and the results obtained using our
method do not resemble those obtained by deleting one outlier. These data
look much more like our second interpretation: a mixture of linear and
quadratic regression.

Our estimates of c were similar for all three pieces of data: 
.0126, .0112, and .0102. 

In conclusion, the model seems general enough to represent many
outlier problems. Choice of $\gamma$ in any reasonable range seems to make
little difference in the estimates. The data seem to dominate the
specification of $\gamma$. Use of this model has reflected well our
intuitive impressions of the data.

In general it seems desirable to fit a model which describes all 
the data well — outliers and all — and regression problems with outliers 
dependent on the x value could use considerable investigation. 
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Table 12 

Regression Analyses for Grades 1 & 2
Experimental Group Total IQ 1 & 3, N = 19

Standard Least Squares

                                   $\hat{\alpha}$    $\hat{\beta}$    $s_{y \cdot x}$
All Data                               116.7            .93            19.39
Outlier reduced from 202 to 160        114.5            .71            13.48
Outlier deleted                        112.0            .58            12.63

Maximum Likelihood Estimates Under Outlier Model (m = 60)

                   $\gamma$    $\hat{\alpha}$    $\hat{\beta}$     $s^2$     $\hat{c}$
                     .001         116.4           .8997        338.96      .0098
                     .01          113.13          .5771        141.59      .0127
                     .05          112.92          .5785        141.49      .0126
                     .10          112.65          .5796        142.13      .0124
                     .20          111.97          .5782        146.13      .0120
Outlier deleted      .05          111.82          .5698        150.01      .0024

Solutions converged to 6 significant figures after about 20 iterations.
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Table 13 

Regression Analyses for Grades 3 & 4
Experimental Group Verbal IQ 1 & 3, N = 26

Standard Least Squares

                        $\hat{\alpha}$    $\hat{\beta}$    $s_{y \cdot x}$
All Data                   115.65           1.07            26.92
1 Outlier deleted          109.60            .70            13.85

Maximum Likelihood Estimates Under Outlier Model (m = 60)

                     $\gamma$    $\hat{\alpha}$    $\hat{\beta}$     $s^2$     $\hat{c}$
                       .001         11 (remaining digits and entries illegible)
                       .01          110.97          .7052        191.52      .0113
                       .05          110.70          .7123        191.48      .0112
                       .10          110.35          .7214        191.66      .0112
                       .20          109.59          .7405        192.92      .0110
1 Outlier deleted      .05          10?.41          .6708        1?0.93      .0022
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Table 14 

Regression Analyses for Grades 5 & 6
Experimental Group Verbal IQ 1 & 3, N = 23

Standard Least Squares

                        $\hat{\alpha}$    $\hat{\beta}$    $s_{y \cdot x}$     $s^2$
All Data                   115.35           1.14            20.8          432.6
1 Outlier deleted          110.73            .98            15.9          252.8

Maximum Likelihood Estimates Under Outlier Model (m = 60)

                     $\gamma$    $\hat{\alpha}$    $\hat{\beta}$     $s^2$     $\hat{c}$
                       .001         115.24          1.137        408.7       .007
                       .01          108.65          .8455         97.40      .0102
                       .05          108.27          .8444         93.07      .0102
                       .10          107.83          .8433         88.14      .0103
                       .20          106.99          .84           81.14      .0103
                       .30          106.31          .8405         76.13      .0104
                       .60          105.0           .8468         71.66      .0105
                       .70          104.6           .8503         71.95      .0105
                       .80          104.3           .8544         73.25      .0105
                       .0           114.4           1.136        414
1 Outlier deleted      .05          107.72          .8459         95.64      .0112
CONCLUSIONS

We have proposed a model describing outliers in a linear regression
problem, derived the maximum likelihood estimators of the parameters,
and examined the asymptotic properties of the estimators. We have
examined the behavior of the estimators in small samples and their
robustness to inaccurate specification of $\gamma$. We have applied the
model to some real data.

Although this particular quadratic outlier model was suggested by
some real data and seems to be useful in the analysis of that data, the
importance of the paper does not lie in this particular model but in
the demonstration that models of this kind can be useful, that the
asymptotic properties provide not unreasonable indications of behavior
for samples as small as 21, and that the procedure may be quite robust
to inaccurate specification of $\gamma$. Thus, building models to describe
outliers and estimating the parameters of these models provides an
interesting alternative to procedures of outlier detection followed by
ordinary least squares procedures.
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